Easy R scripts for Two-Stage Least Squares, Instruments, Inferential Statistics and Latent Variables

Authors

  • Douglas R. White
  • Ren Feng
  • Giorgio Gosti
  • Tolga Oztan
Abstract

The two-stage least-squares inferential statistics (2SLS-IS) scripts in R provided here help to discriminate the quality of results in regression models. They improve on prototypes by Dow (2007) and Eff and Dow (2009) that implemented a new two-stage least squares (2SLS) standard for regression models with missing-data imputation and controls for autocorrelation. The 2SLS-IS modeling improvements provided by these inferential statistics (-IS) scripts are fourfold. First, they provide a relative effect (reff) measure, analogous to the use of percentaged variables in regression modeling (Fox 2002:27) and to a linear transformation of the regression coefficient that scales uniformly, so that the strength of each variable can be compared alongside iid significance tests. Second, they evaluate competing variables within regression models by inferential statistics derived from random subsamples of observations: model coefficients are estimated in one subsample and then used in the remaining independent subsamples to test the resilience of significance levels. Third, they optimize imputation of missing data with an option to impute missing observations on all variables and all cases (Rubin 1996), rather than only the cases for which the dependent variable is coded (Eff and Dow 2009, Dow 2007). Fourth, they link to latent variables and structural equation models (SEM), and will include the hierarchical partitioning of variance widely used for 2SLS (Brown and Eff 2010) and White and Lu’s (2010) Hausman tests for model robustness. Each of these features helps to provide inferential statistics for evaluating regression and causal models, and to output results useful for analyzing networks of causal, bias, and control variables. The 2SLS-IS scripts and planned add-ons are modular, currently in three source files, only one of which the user must edit to define a new model or to change the database in *.Rdata format.
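The core 2SLS step with an autocorrelation control can be sketched outside R. The example below is a minimal Python/NumPy sketch, assuming the common network-autocorrelation setup y = ρWy + Xβ + ε, in which the autocorrelation term Wy is endogenous and is instrumented by the lagged regressors WX. Everything in it is hypothetical: the simulated data, the random row-normalized weight matrix `W`, the helper `two_sls`, and the 50/50 `train_ratio`. The split-sample part is only a simplified stand-in for the resilience test described above (it re-estimates on the held-out rows and compares t statistics); it is not the authors' procedure, and it omits imputation and the reff measure.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical simulated data (not SCCS data)
n, k_neighbors = 1000, 5
rho_true = 0.4
beta_true = np.array([1.0, 2.0, -1.5])  # intercept, x1, x2

# Row-normalized weight matrix W: each case is linked to 5 random other cases
W = np.zeros((n, n))
for i in range(n):
    others = rng.choice(np.delete(np.arange(n), i), size=k_neighbors, replace=False)
    W[i, others] = 1.0 / k_neighbors

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = rng.normal(size=n)
# y = rho*W@y + X@beta + eps, solved as y = (I - rho*W)^{-1} (X@beta + eps)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

Wy = W @ y         # endogenous autocorrelation term
WX = W @ X[:, 1:]  # excluded instruments: lagged regressors (constant dropped)


def two_sls(y, X_exog, endog, excluded):
    """2SLS for y = X_exog @ beta + rho * endog + error.

    Returns (coef, se) with homoskedastic standard errors; the last
    entry of each array belongs to the endogenous regressor (rho).
    """
    Z = np.column_stack([X_exog, excluded])  # full instrument set
    # Stage 1: regress the endogenous term on all instruments
    gamma, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    X_hat = np.column_stack([X_exog, Z @ gamma])
    # Stage 2: OLS of y on the exogenous regressors plus the fitted term
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    # Residuals use the observed endogenous term, not its fitted value
    resid = y - np.column_stack([X_exog, endog]) @ coef
    sigma2 = resid @ resid / (len(y) - X_hat.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))
    return coef, se


coef, se = two_sls(y, X, Wy, WX)
print("full sample coef [b0, b1, b2, rho]:", coef.round(3))

# Simplified split-sample check (not the authors' exact procedure):
# estimate on a random training subsample, re-estimate on the held-out
# rows, and compare t statistics across the two halves.
train_ratio = 0.5
idx = rng.permutation(n)
cut = int(train_ratio * n)
t_stats = {}
for name, rows in (("train", idx[:cut]), ("holdout", idx[cut:])):
    c, s = two_sls(y[rows], X[rows], Wy[rows], WX[rows])
    t_stats[name] = c / s
    print(f"{name:8s} t [b0, b1, b2, rho]:", t_stats[name].round(1))
```

With this seed, the full-sample estimates should land close to the simulated values (ρ = 0.4; β = 1.0, 2.0, -1.5), and the x-coefficients should remain strongly significant in both halves. Note that the held-out rows still draw on the full-sample W, so the two halves are not fully independent in this sketch.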
The advantage of working in the R computing environment, a widely used, cooperatively developed collection of free open-source software, is the flexibility not only to execute programs but also to write or incorporate functions that operate on data, on the output of other scripts, or on databases. The 2SLS-IS scripts adapted to the Standard Cross-Cultural Sample (SCCS) build on James Dow’s (2004) SCCS.Rdata and his editing of missing-value codes to conform to R, and on prototypes (Dow 2007, Eff and Malcolm Dow 2009, Brown and Eff 2010) for missing-data imputation and efficient estimates of significance using 2SLS for regression models with autocorrelation controls. Potential collaborators, users, and developers can communicate about further open-access program development. Importing documented packages from the extensive R package archives makes it easy to extend these benefits and lets new authors successively improve the software.

Outline

1 – Introducing regression models with controls for autocorrelation: 2SLS and 2SLS-IS
1 – 1 The Basis of the 2SLS-IS Scripts
1 – 2 Why model? Causal Modeling Advances and Structural Models
2 – The 2SLS-IS scripts
2 – 1 Causal Modeling Advances with Relative δy/δx Effects and Variance Inflation Factors (vifs)
2 – 2 Replication: Comparing Output of Eff and Dow’s 2SLS with 2SLS-IS
2 – 3 Resilience: Using the Random Inferential Subsample Training Ratio
3 – Robustness tests of prototype models and identification of improved models
3 – 1 Latent Variables: Moral Gods (“Hidden variables”)
3 – 2 Competition among Models using Significance Tests and Relative δy/δx Effects
4 – Networks of Variables, 2SLS-IS Regression, and SEM
4 – 1 Direct and Indirect Variables, correcting for biased subsamples, and networks of variables
4 – 2 Structural arrows and SEM: What can we learn about theory from Graphical language?
4 – 3 Exogeneity and Multiple Imputation
4 – 4 No Free Lunch: A coupled 2SLS-IS and SEM approach to networks of variables
Appendix 1: Mathematical intuitions
5 – Conclusions
Appendix 2: Inferential Statistics Tables

To do:
  • relaimpo
  • Hal White/Lu Hausman test
  • option for indep_vars not subject to robustness tests
  • dot graphs that run sem models from EQS-type commands
  • Kyono Commander in R for EQS output
  • autocorrelation coefficients per WX autocorrelation variable
  • effective sample sizes for each independent variable
  • use of scale() to normalize variables for probit
  • option to restrict X in WX to restrict_vars, defaulting to the full set of independent variables
  • use separate WX IVs from WYWX.Rdata to estimate sem models (Fox 2006)
  • do that run with FxCmty fully imputed and WYWX

Similar resources

Partial least squares methods: partial least squares correlation and partial least square regression.

Partial least square (PLS) methods (also sometimes called projection to latent structures) relate the information present in two data tables that collect measurements on the same set of observations. PLS methods proceed by deriving latent variables which are (optimal) linear combinations of the variables of a data table. When the goal is to find the shared information between two tables, the ap...

Censored Regression Quantiles with Endogenous Regressors

This paper develops a semiparametric method for estimation of the censored regression model when some of the regressors are endogenous (and continuously distributed) and instrumental variables are available for them. A “distributional exclusion” restriction is imposed on the unobservable errors, whose conditional distribution is assumed to depend on the regressors and instruments only through a...

Partial least squares regression and projection on latent structure regression (PLS Regression)

Partial least squares (PLS) regression (a.k.a. projection on latent structures) is a recent technique that combines features from and generalizes principal component analysis (PCA) and multiple linear regression. Its goal is to predict a set of dependent variables from a set of independent variables or predictors. This prediction is achieved by extracting from the predictors a set of orthogonal ...

Instrument Selection by First Stage Prediction Averaging

This paper considers model averaging as a way to select instruments for the two stage least squares and limited information maximum likelihood estimators in the presence of many instruments. We propose averaging across least squares predictions of the endogenous variables obtained from many different choices of instruments and then use the average predicted value of the endogenous variables in ...

Process Modeling by Bayesian Latent Variable Regression

Mohamed N. Nounou, Bhavik R. Bakshi, Prem K. Goel, Xiaotong Shen; Department of Chemical Engineering and Department of Statistics, The Ohio State University, Columbus, OH 43210, USA. Abstract: Large quantities of measured data are being routinely collected in a variety of industries and used for extracting linear models for tasks such as process c...

Publication date: 2012